parallel perturbation
A Study of Parallel Perturbative Gradient Descent
We have continued our study of a parallel perturbative learning method [Alspector et al., 1993] and implications for its implemen(cid:173) tation in analog VLSI. Our new results indicate that, in most cases, a single parallel perturbation (per pattern presentation) of the func(cid:173) tion parameters (weights in a neural network) is theoretically the best course. This is not true, however, for certain problems and may not generally be true when faced with issues of implemen(cid:173) tation such as limited precision. In these cases, multiple parallel perturbations may be best as indicated in our previous results.
A Study of Parallel Perturbative Gradient Descent
Motivated by difficulties in analog VLSI implementation of back-propagation [Rumelhart et al., 1986] and related algorithms that calculate gradients based on detailed knowledge of the neural network model, there were several similar recent papers proposing to use a parallel [Alspector et al., 1993, Cauwenberghs, 1993, Kirk et al., 1993] or a semi-parallel [Flower and Jabri, 1993] perturbative technique which has the property that it measures (with the physical neural network) rather than calculates the gradient. This technique is closely related to methods of stochastic approximation [Kushner and Clark, 1978] which have been investigated recently by workers in fields other than neural networks.
A Study of Parallel Perturbative Gradient Descent
Motivated by difficulties in analog VLSI implementation of back-propagation [Rumelhart et al., 1986] and related algorithms that calculate gradients based on detailed knowledge of the neural network model, there were several similar recent papers proposing to use a parallel [Alspector et al., 1993, Cauwenberghs, 1993, Kirk et al., 1993] or a semi-parallel [Flower and Jabri, 1993] perturbative technique which has the property that it measures (with the physical neural network) rather than calculates the gradient. This technique is closely related to methods of stochastic approximation [Kushner and Clark, 1978] which have been investigated recently by workers in fields other than neural networks.
A Study of Parallel Perturbative Gradient Descent
Motivated by difficulties in analog VLSI implementation of back-propagation [Rumelhart et al., 1986] and related algorithms that calculate gradients based on detailed knowledge of the neural network model, there were several similar recent papersproposing to use a parallel [Alspector et al., 1993, Cauwenberghs, 1993, Kirk et al., 1993] or a semi-parallel [Flower and Jabri, 1993] perturbative technique which has the property that it measures (with the physical neural network) rather than calculates the gradient. This technique is closely related to methods of stochastic approximation[Kushner and Clark, 1978] which have been investigated recently by workers in fields other than neural networks.